Data Analysis
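The sections below analyse one DataFrame per position. This is a minimal sketch of that split. It assumes the raw player table has a Position column with the values Defender, Forward, Goalkeeper and Midfielder. Only the name fwd_data appears in this notebook, so def_data, gk_data and mid_data are hypothetical names, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical miniature of the player table; the real file name and
# full column list are not given in the notebook text.
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Position": ["Defender", "Forward", "Goalkeeper", "Midfielder", "Defender"],
    "Goals": [1, 12, 0, 5, 3],
})

# One DataFrame per position (assumed Position values), mirroring the
# defenders / forwards / goalkeepers / midfielders sections.
def_data = players[players["Position"] == "Defender"].reset_index(drop=True)
fwd_data = players[players["Position"] == "Forward"].reset_index(drop=True)
gk_data = players[players["Position"] == "Goalkeeper"].reset_index(drop=True)
mid_data = players[players["Position"] == "Midfielder"].reset_index(drop=True)

print(len(def_data), len(fwd_data), len(gk_data), len(mid_data))
```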

Defenders data

Top 5 players by Goals

Top 5 players by Yellow cards

Top 5 players by Red cards

Top 5 players by Tackle success %

Top 5 players by Duels won

Top 5 players by Duels lost

Top 5 players by Aerial battles won

Top 5 players by Aerial battles lost

Top 5 players by Errors leading to goal

Top 5 players by Passes per match

Top 5 players by Cross accuracy %
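Each "Top 5" heading in this section is one ranking. Below is a minimal pandas sketch of such a ranking using DataFrame.nlargest. The helper top_n, the Name column and the sample values are assumptions for illustration and are not the notebook's data.

```python
import pandas as pd

def top_n(df: pd.DataFrame, column: str, n: int = 5) -> pd.DataFrame:
    """Return the n players with the largest values in `column`."""
    return df.nlargest(n, column)[["Name", column]].reset_index(drop=True)

# Hypothetical defenders sample; "Name" and the stat columns are assumed names.
defenders = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4", "P5", "P6", "P7"],
    "Goals": [2, 7, 0, 4, 9, 1, 3],
    "Duels won": [120, 95, 210, 180, 60, 150, 130],
})

print(top_n(defenders, "Goals"))
print(top_n(defenders, "Duels won", n=3))
```

For these made-up values the Goals ranking is P5, P2, P4, P7, P1. nlargest needs a numeric column, so text-formatted stats must be converted first.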

Forwards data (fwd_data)

Top 5 players by Goals

Top 5 players by Yellow cards

Top 5 players by Red cards

Top 5 players by Fouls

Top 5 players by Offsides

Top 5 players by Losses

Top 5 players by Passes per match

Top 5 players by Shooting accuracy %
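Ranking by a percentage column such as Shooting accuracy % only works if the column is numeric. The raw CSV may store these values as text like "45%". That is an assumption about the file, and it would also apply to the defenders' Tackle success % and Cross accuracy %. If so, the values can be converted before ranking. Below is a sketch with a hypothetical helper percent_to_float and made-up rows.

```python
import pandas as pd

def percent_to_float(series: pd.Series) -> pd.Series:
    """Convert values such as "45%" (or already-numeric values) to floats.

    Anything that cannot be parsed (e.g. a missing value) becomes NaN.
    """
    cleaned = series.astype(str).str.rstrip("%").str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Hypothetical forwards sample with text percentages and one missing value.
forwards = pd.DataFrame({
    "Name": ["F1", "F2", "F3"],
    "Shooting accuracy %": ["45%", "52%", None],
})
forwards["Shooting accuracy %"] = percent_to_float(forwards["Shooting accuracy %"])

top2 = forwards.nlargest(2, "Shooting accuracy %")
print(top2)
```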

Goalkeepers data

Top 5 players by Wins

Top 5 players by Losses

Top 5 players by Saves

Top 5 players by Penalties Saved

Top 5 players by Catches

Top 5 players by Errors leading to goal

Top 5 players by Fouls
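Counting stats such as Errors leading to goal take small integer values, so several goalkeepers often share fifth place. By default, nlargest keeps the first tied rows it meets. Passing keep="all" returns every tied player instead. Below is a sketch on made-up values.

```python
import pandas as pd

# Hypothetical goalkeepers sample with a three-way tie at the value 1.
goalkeepers = pd.DataFrame({
    "Name": ["G1", "G2", "G3", "G4", "G5", "G6", "G7"],
    "Errors leading to goal": [3, 2, 2, 1, 1, 1, 0],
})

# Default keep="first": exactly 5 rows, G6 is cut although it ties with G4/G5.
strict = goalkeepers.nlargest(5, "Errors leading to goal")
# keep="all": every player tied with the 5th value is kept.
with_ties = goalkeepers.nlargest(5, "Errors leading to goal", keep="all")

print(len(strict), len(with_ties))  # 5 6
```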

Midfielders data

Analysis of the top 5 most valuable players, using these criteria:

    - Goals + assists
    - Yellow/red card ratio (higher is worse)
    - Fouls and offsides (higher is worse)
    - Duels won/lost ratio
    - Passes per match
    - Through balls + interceptions score
    - Pass accuracy
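Below is one way to turn the criteria above into a single ranking, sketched under several assumptions:

- The column names are guessed.
- Each criterion is ranked separately and the ranks are averaged. This is a hypothetical scoring scheme, not necessarily the one used in this notebook.
- The card criterion uses the total of yellow and red cards instead of a ratio, because a yellow/red ratio is undefined for players with no red card.
- Pass accuracy is left out because the sample has no such column.

```python
import numpy as np
import pandas as pd

# Hypothetical midfielders sample; all column names are assumptions.
mid = pd.DataFrame({
    "Name": ["M1", "M2", "M3", "M4", "M5", "M6"],
    "Goals": [8, 2, 5, 1, 3, 0],
    "Assists": [6, 9, 2, 1, 4, 2],
    "Yellow cards": [3, 5, 1, 7, 2, 4],
    "Red cards": [0, 1, 0, 1, 0, 0],
    "Fouls": [20, 35, 15, 40, 18, 25],
    "Offsides": [5, 2, 3, 1, 4, 0],
    "Duels won": [150, 200, 120, 90, 160, 110],
    "Duels lost": [100, 150, 130, 120, 80, 110],
    "Passes per match": [45.2, 60.1, 38.5, 30.0, 52.3, 41.7],
    "Through balls": [10, 25, 4, 2, 12, 6],
    "Interceptions": [30, 45, 20, 15, 35, 28],
})

# Derived criteria from the list above.
metrics = pd.DataFrame({"Name": mid["Name"]})
metrics["Goals + assists"] = mid["Goals"] + mid["Assists"]
metrics["Cards"] = mid["Yellow cards"] + mid["Red cards"]        # higher is worse
metrics["Fouls + offsides"] = mid["Fouls"] + mid["Offsides"]      # higher is worse
# Avoid division by zero: players with no lost duels get NaN.
metrics["Duel ratio"] = mid["Duels won"] / mid["Duels lost"].replace(0, np.nan)
metrics["Passes per match"] = mid["Passes per match"]
metrics["Through balls + interceptions"] = mid["Through balls"] + mid["Interceptions"]

# Rank 1 = best: ascending for "higher is worse" criteria, descending otherwise.
higher_is_worse = {"Cards", "Fouls + offsides"}
ranks = pd.DataFrame({
    col: metrics[col].rank(ascending=col in higher_is_worse)
    for col in metrics.columns if col != "Name"
})

# Lower average rank = more valuable under this hypothetical scheme.
metrics["Average rank"] = ranks.mean(axis=1)
top5 = metrics.nsmallest(5, "Average rank")
print(top5[["Name", "Average rank"]])
```

Tied values share the average of their ranks, so no criterion arbitrarily favours one of two equal players.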

Top 5 players by Goals

Top 5 players by Assists

Top 5 players by Yellow cards

Top 5 players by Red cards

Top 5 players by Fouls

Top 5 players by Offsides

Top 5 players by Duels won

Top 5 players by Duels lost

Top 5 players by Passes per match

Top 5 players by Through balls

Top 5 players by Interceptions

Data preprocessing:

For the machine learning part, we use the defenders data to predict the number of goals conceded by each player.

Define the input features as X and the output (goals conceded) as y.
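Below is a sketch of the X/y split. It assumes the target column is named Goals conceded and that only numeric columns are used as inputs, since text columns such as names or clubs cannot be scaled. The sample rows and the other column names are hypothetical.

```python
import pandas as pd

# Hypothetical defenders sample; column names are assumptions.
defenders = pd.DataFrame({
    "Name": ["D1", "D2", "D3"],
    "Club": ["X", "Y", "Z"],
    "Appearances": [30, 25, 34],
    "Clean sheets": [10, 6, 12],
    "Goals conceded": [35, 40, 28],
})

target = "Goals conceded"
# Inputs: every numeric column except the target (text columns are dropped).
X = defenders.drop(columns=[target]).select_dtypes(include="number")
# Output: the column to predict.
y = defenders[target]

print(list(X.columns))
```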

Scale the input features.

Split the dataset into training and test sets.
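Below is a sketch of the scaling and splitting steps using scikit-learn's train_test_split and StandardScaler. It splits first and then fits the scaler on the training rows only, so the test rows do not influence the scaling statistics. This ordering is a swapped-in common practice, not necessarily the notebook's. The synthetic data, test_size=0.2 and random_state=42 are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the defender features and goals conceded.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std learned from training rows only
X_test_scaled = scaler.transform(X_test)        # same training mean/std applied to test rows
```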

Modeling

Model 1: LinearRegression

Actual vs predicted values for linear regression
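The actual-vs-predicted charts can be drawn with one helper reused for every model. Below is a matplotlib sketch that assumes a scatter plot with a dashed y = x line marking perfect predictions. The function name plot_actual_vs_predicted and the sample values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

def plot_actual_vs_predicted(y_true, y_pred, title):
    """Scatter actual vs. predicted values with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.7)
    # Points on this dashed line would be perfect predictions.
    low = min(y_true.min(), y_pred.min())
    high = max(y_true.max(), y_pred.max())
    ax.plot([low, high], [low, high], "r--", label="perfect prediction")
    ax.set_xlabel("Actual goals conceded")
    ax.set_ylabel("Predicted goals conceded")
    ax.set_title(title)
    ax.legend()
    return fig, ax

fig, ax = plot_actual_vs_predicted(
    [30, 42, 25, 38], [31.5, 40.2, 26.1, 37.0], "LinearRegression"
)
```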

Test set evaluation
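Below is a sketch of fitting LinearRegression and evaluating it on the test set with MAE and R². It uses synthetic data, so the printed scores say nothing about the notebook's results. Scaling is skipped here because rescaling the features does not change ordinary least-squares predictions when the model has an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target (not the notebook's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = 20 + X @ np.array([4.0, -3.0, 2.0]) + rng.normal(scale=0.5, size=120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Evaluate on held-out rows only.
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}  R2: {r2:.3f}")
```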

Model 2: RandomForestRegressor

Actual vs predicted values for RandomForestRegressor

Model 3: SVR

Actual vs predicted values for SVR

Model 4: XGBRegressor

Actual vs predicted values for XGBRegressor

Model 5: DecisionTreeRegressor

Model 6: GradientBoostingRegressor

Model 7: BaggingRegressor
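All of the models can be fitted in one loop because they share scikit-learn's fit/predict interface. Below is a sketch on synthetic, scaled data. The default hyperparameters and random_state=42 are assumptions. XGBRegressor is added only if the optional xgboost package is installed.

```python
import numpy as np
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data, split and then scaled with training statistics.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = 20 + X @ np.array([4.0, -3.0, 2.0]) + rng.normal(scale=0.5, size=150)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(random_state=42),
    "SVR": SVR(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=42),
    "BaggingRegressor": BaggingRegressor(random_state=42),
}

try:
    from xgboost import XGBRegressor  # optional dependency
    models["XGBRegressor"] = XGBRegressor(random_state=42)
except ImportError:
    pass

# Fit each model and keep its test-set predictions for later comparison.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```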

Evaluation of all models on the test data

MAE of all models

R² score of all models
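Below is a sketch of collecting MAE and R² for every model into one table sorted by R², which is how the best model can be read off. The helper evaluation_table and the two sets of predictions are hypothetical, not the notebook's results.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

def evaluation_table(y_true, predictions):
    """Return MAE and R2 for every model, best R2 first."""
    rows = [
        {
            "Model": name,
            "MAE": mean_absolute_error(y_true, y_pred),
            "R2": r2_score(y_true, y_pred),
        }
        for name, y_pred in predictions.items()
    ]
    return pd.DataFrame(rows).sort_values("R2", ascending=False).reset_index(drop=True)

# Made-up test targets and predictions for two hypothetical models.
y_true = [30, 40, 50, 60]
predictions = {
    "Model A": [31, 39, 51, 59],
    "Model B": [35, 35, 55, 55],
}
table = evaluation_table(y_true, predictions)
print(table)
```

For these made-up values, Model A has MAE 1.0 and R² 0.992, and Model B has MAE 5.0 and R² 0.8, so Model A is listed first.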

Conclusion:

Linear regression is the best of the seven models for predicting goals conceded: it has the highest R² score on the test data, 0.98.